Client-side monitoring for Web mining

Authors

  • Kurt D. Fenstermacher
  • Mark Ginsburg
Abstract

“Garbage in, garbage out” is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user’s actions ever reaches the Web server, analysts must rely on incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior on the Web.
Imagine a library where the librarians know not only which books have been checked out, but also which books patrons have pulled from the shelves. For each book pulled from the shelf, the library staff knows whether the reader simply skimmed the text and returned it to the shelf, or carefully read two of the chapters. If the copier were used to copy pages from a journal, the staff would know which issues and which pages were copied and how they were used. With such information, the staff could better judge how the collection was being used, which areas could be improved, and more. In short, the staff could help people become better library users. Such monitoring is not possible in the world of printed texts, but it is possible in the online world. In cyberspace, many people access information through the World Wide Web, making the Web a candidate for improved monitoring. Unfortunately, just as many libraries only know which works are checked out, very little information is available about how people use the Web. Today, Web server logs are the primary source of data for mining. Server logs are thus the base from which analysts draw inferences about user behavior.
Although the problems of server log analysis are well known, there has not been extensive work exploring user monitoring at the client. In this paper, we summarize the shortcomings of server-side data and propose a framework for client-side Web monitoring. Monitoring at the Web browser level enables much richer data collection, supporting better analysis. Although client-side monitoring schemes have been proposed before, our proposed framework is extensible to other client applications. With broader monitoring on the client side, analysts can view Web usage within the overall client context. In the following section, we survey current server-side analysis, describing the collection of the data and its shortcomings for inferring Web user behavior. After summarizing the current state of the art, we outline the goals of a client-based monitoring framework. With the goals specified, we describe a framework that achieves those goals for users who work with common applications on the Microsoft Windows platform. After presenting a framework for client-side data gathering, we discuss some novel analyses that draw on the richer knowledge of the client. We focus on the potential impact of integrating client-side monitoring of Web access with other client-side applications. Finally, we conclude by describing future directions for the framework.

Similar resources

Mining Client-Side Activity for Personalization

“Garbage in, garbage out” is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user’s actions ever reach the Web server, analysts must rely on incomplete data. In this paper,...

A Framework for Personal Web Usage Mining

In this paper, we propose to mine Web usage data on the client side, or personal Web usage mining, as a complement to server-side Web usage mining. By mining client-side Web usage data, more complete knowledge about Web usage can be obtained. A framework for personal Web usage mining is proposed. Some related issues and applications of personal Web usage mining

Script-based System for Monitoring Client-side Activity

In this paper, a system for monitoring client-side activity is described. The system is capable of monitoring a wide range of user actions performed in a web browser environment. Event logs are sent to the web server for further processing. The described system is completely script-driven on the client side, requiring no additional software to be installed on the client system and guaranteeing pl...

Privacy-Preserving History Mining for Web Browsers

We introduce a new technique that permits servers to harvest selected Internet browsing history from visiting clients. Privacy-Preserving History Mining (PPHM) requires no installation of special-purpose client-side executables. Paradoxically, it exploits a feature in most browsers (IE, Firefox and Safari) regarded for years as a privacy vulnerability. PPHM enables privacy-preserving data-minin...

Issues of Learning the Browsing Language

The web is pervading all walks of life, and its huge increase in information volume has made web personalization mandatory. Web personalization may be achieved by web mining, especially by applying web usage mining techniques to surfing behavior. Learning surfing behavior patterns has emerged as a promising research area for achieving web personalization. Till recently web usage mining was don...

Journal:
  • JASIST

Volume: 54

Publication date: 2003